After-meal blood glucose level prediction for type-2 diabetic patients

Type 2 Diabetes, a metabolic disorder disease, is becoming a fast growing health crisis worldwide. It reduces the quality of life, and increases mortality and health care costs unless managed well. After-meal blood glucose level measure is considered as one of the most fundamental and well-recognized steps in managing Type 2 diabetes as it guides a user to make better plans of their diet and thus control the diabetes well. In this paper, we propose a data-driven approach to predict the 2 h after meal blood glucose level from the previous discrete blood glucose readings, meal, exercise, medication, & profile information of Type 2 diabetes patients. To the best of our knowledge, this is the first attempt to use discrete blood glucose readings for 2 h after meal blood glucose level prediction using data-driven models. In this study, we have collected data from five prediabetic and diabetic patients in free living conditions for six months. We have presented comparative experimental study using different popular machine learning models including support vector regression, random forest, and extreme gradient boosting, and two deep layer techniques: multilayer perceptron, and convolutional neural network. We present also the impact of different features in blood glucose level prediction, where we observe that meal has some modest and medication has a good influence on blood glucose level.


Introduction
According to Diabetes Atlas 2021 [1], Diabetes, a chronic metabolic disorder disease, is the fastest growing health emergencies in the twenty-first century.In 2021, approximately 537 million adults (20-79 years) were living with diabetes, and nearly 7 million diabetes related death had been reported.The number of people with diabetes is projected to rise to 643 million by 2030 and 783 million by 2045.Thus, researchers, health workers, and policy makers take different initiatives to combat diabetes.As this disease cannot be fully cured, if the disease is not controlled then a diabetic patient will have long-term diabetes related complications such as cardiovascular disease, kidney failure, diabetic retinopathy, etc.Thus, one of the key focuses of the practitioners in this area is to manage diabetes through a controlled life-style (e.g., Ref. [2]).
The main goal of a diabetes management system is to keep the fasting Blood Glucose Level (BGL) within the euglycemic (normal) range of 70 mg/dL and 126 mg/dL before meals and under 180 mg/dL 2 h after meals.Meal-intake has a direct influence on BGL, in particular, the nutrient Carbohydrate (CHO) (or simply carb) in a meal increases the BGL [3], and protein, fat, & fiber part of the meal slow down the meal consumption process.Hence, the nutritional ingredients of a meal play an important role in determining the BGL.Other factors that influence the BGL include insulin or medicine, physical exercise, etc. Insulin or diabetes medicine reduces the BGL to offset the glucose hike due to meal intake.On the other hand, physical exercise reduces BGL by burning energy.In this research, we propose a machine-learning based system to predict the 2 h after meal BGL of Type 2 diabetes patient, taking the meal intake, insulin or medicine, physical exercises, etc. into account.
There are a number of key benefits for predicting BGL 2 h after a meal.If we could predict the BGL 2 h after a meal before eating that meal, a diabetic patient could adjust his/her diet.Also s/he could adjust the intensity and duration of his/her physical exercise, and could adjust his/her insulin dosage amount accordingly to have a euglycemic state 2 h after meal.It is important to note that knowing 2 h after meal BGL also helps patients avoid life threatening abnormal conditions, such as hyperglycemia and hypoglycemia.It should be noted that hyperglycemia is a condition when the BGL is above 126 mg/dL during fasting and above 180 mg/dL 2 h after a meal, and hypoglycemia is a condition when the BGL drops below 70 mg/dL.
With growing urgency to control diabetes, there have been growing research efforts in the BGL prediction area for both Type 1 Diabetes Mellitus (T1DM) and Type 2 DM (T2DM) diabetes [4].Most of the existing research focus on BGL prediction for T1DM diabetes (the type of diabetes due to autoimmune system disorder) (e.g., Refs.[5][6][7]).Many T1DM patients use Continuous Glucose Monitoring (CGM) devices to track BGLs.CGM devices provide a BGL reading every 5 min totaling 288 readings per day.This high frequency data generation motivates the application of data-driven techniques including Deep Learning (DL) based predictive models (e.g., Refs.[7][8][9][10][11]).Data collected using CGM devices are time-series data.On the contrary, T2DM patients usually measure BGL values using a finger stick test with a personal blood glucose meter.Data collected using finger stick tests are discrete data.At the upper end, a T2DM patient may have one or two data points a day.Hence, the time-series models that are developed using CGM time-series data of T1DM patients cannot be readily used to predict BGL for T2DM patients.
Few research works focus on BGL prediction for T2DM patients [9,[12][13][14][15].Faruqui et al. [9] have collected 6 months of data of 10 T2DM patients to forecast next day BGL using deep learning models.Karim et al. [16] have collected three weeks of four T2DM and one T1DM patients' data to use as input to a Neural Network model.Deng et al. [17] have collected CGM data from 40 T2DM patients.Their best performing model used the publicly available OhioT1DM [18] dataset.Gyuk et al. [13] have collected CGM, meal, insulin, and medication data of 26 T2DM patients for three weeks.Yang et al. [14] have collected CGM readings of 100 T1DM and T2DM patients for 4 days.Most of the above mentioned works [12][13][14] have collected T2DM patients' CGM data.Faruqui et al. in Ref. [9] and Albers et al. in Ref. [15] are the only two research works that have collected discrete T2DM data from glucometers.Faruqui et al. [9] have treated the daily glucose data as time series data.Each day glucose value is used as one data point.Hence, their focus is to predict the next day BGL, which is a different goal from ours.Albers et al. [15] have used two physiological models to show that even with sparse glucometer data it is possible to produce well enough personalized models.
The key challenges of predicting 2 h after meal BGL prediction is the scarcity of data that captures different factors affecting the diabetes in free living conditions.Also, there are a large number of factors affecting the 2 h after meal blood glucose, which may vary person to person.
To overcome the above challenges, in this study, we have collected discrete fingertip BGL values, diabetes medication or insulin dosage, physical activity, food value of meal intake, demographic in-formation, and clinical profile of five T2DM patients in free living condition for six months.We have implemented a smart device app to collect those data.We have annotated the meal intake data with appropriate nutrition food value with the help of nutrition experts.There were 23 features in the model ready dataset.Finally, we present that the data-driven models can forecast 2-hr postprandial BGLs in T2DM patients leveraging discrete BGL readings, diabetes medication doses, physical activity details, meal nutritional values, and profile information with a Root Mean Square Error (RMSE) of 31.87 mg/dL, which is an acceptable error for real-life applications of BGL predictions [16,19,20].
We list the major contributions of our research as follows.
• We have prepared a comprehensive dataset consisting of 900+ events collected over six months from 5 T2DM patients in free living condition.• We have collected the meal image (then convert these into corresponding food nutritional values), information about exercise, medicine, etc. for each collection event as these are the primary factors influencing the BGL.• We have provided an effective proof of concept of a novel comprehensive BGL prediction frame-work.In this framework, a data preprocessing step is implemented to ensure data quality, importance of different features are identified, influence of feature combinations are examined, and finally the first data driven system to predict 2 h after meal BGL values from discrete fingertip BGL and other values of T2DM patients with the state-of-the-art prediction performance is presented.• We have presented a comprehensive evaluation of our proposed system with a number of traditional machine learning (ML) and deep learning (DL) techniques, and a detailed analysis of the results.
The remaining sections of the paper are organized as follows.Section 2 reviews the existing literature on BGLP.Section 3 discusses the BGLP framework, data collection process, data preprocessing, and the ML techniques that have been used in this research.Section 4 outlines the experiments that are conducted, the evaluation metrics that are used, and the hyperparameters that are selected.Section 5 discusses the results of the experiments.Section 6 presents the discussion, the limitations of this study, and the potential avenues for future research.Finally, Section 7 provides the summary of the research.

Related works
Most of the existing research have focused on BGL prediction that have reported performance results on 30-min Prediction Horizon (PH), quite a few on 60-min PH, and only a few on 2-hr PH.For a diabetic patient, the fasting BGL and the 2 h after meal BGL are two important metrics that are usually observed to manage diabetes well all over the world [21].Due to its high importance, we focus on 2 h after meal BGL prediction, i.e. the 2-hr PH.We will use the Root Mean Square Error (RMSE), a statistical regression metric that provides the precision level as given in Equation.
2 and Clarke's Error Grid Analysis (CEG) [22], a clinical regression metric that provides the clinical applicability level for model performance comparison.CEG has five regions: A, B, C, D, and E. Higher values in the A and B regions mean more correct predictions and are better.Lower values for C, D, and E regions mean less incorrect predictions and are better.

BGL prediction of type 1 diabetes patients
It is observed in the literature that the history of BGLs is very effective in predicting future BGLs.Therefore, it is the mostly used input feature in BGL prediction.Though over 90% of the diabetic patients have Type 2 diabetes, majority of the research use timeseries of CGM readings of T1DM patients as observed from Table 1.This is mostly due to the high volume of data a CGM device can produce per day.Three types of BGL prediction models: Physiological models, Data Driven models, and Hybrid models are mostly used by the researchers for BGL prediction of T1DM patients.We now discuss in details as follows.

Physiological models
Among BGL prediction models, physiological models are the least used due to the complexity and uniqueness of individual biological systems, and it needs expert domain knowledge.Among the research works [20,23] that have used physiological models, the work by Liu et al. [20] has performed the best with RMSE of 40.46 mg/dL and a 95.27% correct prediction in the A + B regions for a 2-hr PH.Their compartmental composite model of glucose-insulin dynamics uses a deconvolution technique on the CGM values.

Data-driven models
Majority of the research efforts have used data-driven models: traditional Machine Learning (ML) models and Deep Learning (DL) models.Frandes et al. [24] have used 4-7 days of CGM readings and time of different events of 17 T1DM patients as input on their proposed nonlinear Auto-Regressive Neural Network (AR-NN).They have achieved an exceptional performance with an RMSE of 2.37 mg/dL for 30-min PH.Georga et al. in Ref. [25] have investigated the use of Extreme Learning Machines which are in plain eyes Feedforward Neural Networks (FNN) on 5-22 days of glucose, insulin, exercise, and meal data of 15 T1DM patients.They have achieved an RMSE value of 6.10 mg/dL for.
30-min PH.There are other works that have used FNN, Self-Organizing Map (SOM), Wavelet Fuzzy NN (WFNN), Linear Regression (LR) [11], Recursive least squares (RLS), Support Vector Regression (SVR) with differential evolution [26], Random Forest (RF) [27] and have achieved RMSEs of 11. 42  Another group [10] have used 4 days of glucose, insulin (both basal and bolus), and meal data of 9 T1DM patients from the D1namo dataset [29] as input.They have applied a different Long Short-Term Memory (LSTM) based Recurrent Neural Network (RNN) [30] layer for each of the four input features separately followed by a dense layer.All the outputs were then concatenated to get the final output.This model has achieved an RMSE of 6.42 mg/dL for 30-min PH.Gu et al. [31] have taken the same approach except that they have used CNN on individual features, concatenated the outputs, and finally have used the LSTM model to get the final output.They have used their model on a publicly available dataset on tidepool.orgthat has 102-268 days of glucose, insulin, and meal data of 34 T1DM patients and received an RMSE of 9.18 mg/dL for 30-min PH.
Aliberti et al. [8] have used LSTM model on a large dataset of 451 T1DM patients and have received an average RMSE value of 5.93 mg/dL for 30-min PH.Wang et al. [39] have used a model with stacked LSTM layers on 878k data points of twenty T1DM patients and received an RMSE of 11.96 mg/dL for 30-min PH.Zhu et al. [40] have used Dilated Recurrent Neural Networks (DRNN) with vanilla RNN cells on the OhioT1DM dataset and received an RMSE of 18.9 mg/dL on 30-min.
PH.In another work, Zhu et al. [41] have used an attention-based bidirectional RNN (BiRNN) model with confidence provided by evidential regression on 8 weeks of CGM, insulin, and meal data of 12.
T1DM patients to compute personalized BGL prediction and have received an improved RMSE of.18.64 mg/dL.Yang et al. [42] have proposed an autonomous channel DL framework with adaptive input selection capability on the same dataset for BGL prediction.It appears that SVR, MLP, CNN, and LSTMs are being frequently used by the researchers with good performance.

Hybrid models
Among the hybrid models, the work by Georga et al. [43] have used insulin absorption, kinetics, and dynamics model to simulate plasma insulin concentration and glucose absorption model to simulate glucose plasma appearance rate.In addition to these, hour of the day, and activity energy are used as input features to the SVR BGL prediction model.Their best prediction model yields a state of the art performance with a RMSE value of 6.03 mg/dL for 30-min PH and 7.62 mg/dL for 2-hr PH.Another work by Karim et al. [16] have used the glucose absorption curve from a physiological model [45], insulin dosage amount, and CGM readings as input to a FNN with 20 hidden layers and achieved an RMSE of 31.5 mg/dL for 2-hr PH.In another follow-up work in Ref. [46], in addition to the previous inputs the researchers have added the total area under the absorption curve, and basal insulin dosage time, and have found an improved RMSE of 30.92 mg/dL.

BGL prediction of type 2 diabetes patients
Only a few [9,[13][14][15][16] of the relevant research on BGL prediction have used T2DM patients' data that we already have mentioned in Section 1.Among these works, all except [9,15] have used CGM readings of T2DM patients.Faruqui et al. [9] have used discrete BGLs of T2DM patients.However, they have used that data to predict the next day's BGL.The research work by Albers et al. [15] is the only attempt where they have used physiological models to predict 2 h after meal BGLs using discrete BGLs of T2DM patients.Hence, we observe that the majority of the previous studies have used CGM data.There is a lack of research effort using the discrete BGL values of T2DM patients.Therefore, in this work we have proposed a data-driven system to predict BGL values 2 h after meal from discrete fingertip BGL and other values of T2DM patients.

BGL prediction framework
Our proposed BGL prediction framework is presented in Fig. 1.The data collection process is the first and the most important task of this work.The data collection process is described in Subsection 3.1.Data is collected from five (5) T2DM patients in free living condition.The collected data is usually in raw form.They need to be preprocessed before they can be used as input data by any model.The data preprocess step, described in Subsection 3.2 provides the model ready dataset.We have conducted experiments utilizing Principal Component Analysis (PCA), a feature selection strategy, to explore its potential contributions.The model ready dataset is used to train and test the potential models to find the optimized prediction model.The models are described in Subsection 3.4.The optimized model will be loaded to a smart device.When the user takes a meal image just before eating a meal, the loaded model will use that and other concurrently collected features to predict the BGL after 2 h of meal intake.This will help a diabetic patient to adjust her diet or lifestyle instantaneously.

Data collection process
In this subsection, we discuss the details of the data collection process using the DiabetesPal smart app designed by us.The user profiles, used glucometers, etc. are also discussed.

User profiles
In this step, the data has been collected for six months from five Type 2 diabetic and prediabetic patients in the range of 46-82 years of age in free living conditions.Written Informed Consents have been taken from all the patients as a standard practice of data collection.Appropriate approval related to data collection involving human subjects is taken from Bangladesh University of Engineering and Technology (BUET), reference no: Estt./Ta-4/140548/R-3617, dated 04/12/22.User profiles are given in Table 2.Among them, three are male patients and two are female patients.They have various lengths of diabetes history.The patients have taken a blood glucose reading just before a meal, and another reading after 2 h of the meal intake.They have used two different glucometers: Accu-Chek Performa, and One Touch Verio Flex.The patients have taken images of the meal using their personal smartphone.They have taken this data for the same meal of the day that is convenient for them for six months living a normal life at home.

Data collection using the "DiabetesPal" smart app
A smartphone app "DiabetesPal" has been developed (shown in Fig. 2).The patients have used the app to record meal image, two BGL readings: one just before and one 2 h after meal, and all the physical activities data each day.For physical activity, they have entered the activity type, time, and duration.The patients are initially required to create an account on the app.At that account creation or at a later time, the patients record diabetes oral medication or insulin dosage, clinical profiles, and demographic information.Profile and medication information are editable in the app throughout the data collection step.

Data preprocessing
Daily data is directly uploaded to the cloud for storage from the app.All this data has gone through data preprocessing.All of the meal images are manually inspected by a nutrition expert to estimate the nutritional values.For each meal image, the serving size of each ingredient is eye estimated.The Carbohydrate (CHO), protein, fat, fiber, and calories information per serving of each ingredient is taken from Refs.[47,48].The Glycemic Index (GI) information has been taken from Ref. [49].The total CHO, protein, fat, fiber, and calorie of each meal is found by adding the values of all the ingredients.Glycemic Load (GL) is used in place of GI.GL also considers the amount of the ingredient in grams along with its GI [50].To find the Glycemic Load (GL) value Equation (1) given below is used.Moreover, the nutrition expert has ensured that the differences among the macronutrient composition of the meals for different patients in the free-living conditions are negligible.
Patients have manually recorded all information related to physical activities like exercise time, type, and duration in the app (shown in Fig. 2).Physical activities can lower BGL as long as 10 to 24 h after the activity by making body cells sensitive to insulin [51].Hence, any exercise within the 24-hr window before the 2 h BGL reading after meal can be considered.To include the more considerable effect, we have used a shorter 6-hr window.In our experiments we have included only the immediate exercise data within the 6-hr window.We have used the duration of exercise in minutes as the only exercise feature.In Ref. [19] by Kushner et al., they have used intermittent heart rate and step count data.They have found that these data do not have significant influence on BGLs.Xie et al. [35] have demonstrated that there is no added benefit for adding heart rate as a feature.Yet, we have plans to collect and include the step count, calories burned, heart rate, and quality of sleep as input features to further investigate their influence on future BGLs in our future work.
Patients have manually recorded their medication information: name of medicine, dosage amount, and frequency every time their prescriptions change.Any diabetes medicine taken by the patients can be categorized into the following nine categories: Metformin, Sulfonylurea, Pioglitazone, DPP-4-Inhibitor, GLP-1-RA, SGLT2-Inhibitor, Basal Insulin, Prandial Insulin, and Mixed Insulin [52].From these categories we get nine features for our models.The values of the features are the dosage amounts.After extraction of data from the source, we get six (6) demographic features: Gender, Height, Weight, Family History, Duration, and Age; one (1) exercise feature: duration; nine (9) diabetes medicine features: Metformin, Sulfonylurea, Pioglitazone, DPP-4-Inhibitor, GLP-1-RA, SGLT2-Inhibitor, BasalInsulin, PrandialInsulin, and MixedInsulin; six (6) meal features: CHO, Protein, Fat, Fiber, Calorie, and GL; and one (1) BGL reading feature: BGBefore.In total we have 23 features.One sample data is given in Table 3.
Through frequent communication with the subjects, we have observed that over 95% of the cases they have adhered to medication.We have identified the rest of the 5% cases and have collected the correct dosage amount.We have also observed that they haven't missed to report their physical activity information.Any sample data with missing meal image, and missing BGL reading (before or after) is removed.Categorical features are converted to numerical features with one-hot encoding.All of the other numerical features are normalized with the min-max normalization techniques to avoid biases from features with large values.

Feature selection
We employ a feature selection strategy to pick out the most impactful features from a multitude of options.Specifically, we integrate Principal Component Analysis (PCA) [53] as our chosen dimensionality reduction technique.The aim is to investigate how PCA can contribute to creating more straightforward models, accelerating computational processes, and enhancing the ability to generalize to novel, unseen data.The results are discussed in Subsection 5.6.

Models
Due to the need of personalized model for each patient, and expert domain knowledge by the physiological models and high frequency data generation by the CGM devices, data-driven models have attracted a lot of attention from the research community.As discussed in Section 2, majority of the studies have focused on the Type 1 diabetes and have used the CGM time-series data.T2DM patients' discrete BGL data has not been used by many previous works.In this paper, we have used and compared the performances of few data-driven models including several machine learning regression models: Support Vector Regression (SVR), Random Forest (RF), and eXtreme Gradient Boosting (XGB), CNN model, and MLP model.We now discuss them in brief as follows.

Random forest (RF)
Random Forest [54] uses a number of decision trees created on various subsets of features randomly selected from the original feature set.Sometimes, more randomness is added by selecting a random threshold for each feature.For regression, it takes the average of the predictions from all the trees (as shown in Fig. 3a).The more the number of decision trees, the better the accuracy, and the better RF handles the overfitting.

Extreme gradient boosting (XGBoost)
XGBoost [55] is also a decision tree based sequential learning algorithm that starts with an initial model.The average of all the target values can be used as the predicted value for the initial model.This initial model will have residuals.It then creates subsequent models, where each subsequent model uses a shallow decision tree to fit the residuals from the previous model to correct the errors.Models are added until a fixed number of trees are created, or an acceptable performance is reached, or no more performance improvement is received.The final prediction is the sum of the base model prediction and the weighted sum of all the subsequent models' predictions (as shown in Fig. 3b).It is a fast and outstanding performing technique for regression problems with structured or tabular data.

Support vector regression (SVR)
Support Vector Regression [56] uses the same principles of Support Vector Machine (SVM).SVM is a binary classifier that tries to find a line or hyperplane to separate the two classes well with a max margin between the two decision boundaries.The data points from the two classes are beyond the two decision boundaries.Initially, SVR tries to find a line or hyperplane to separate the two classes well with a good enough margin to have almost all the data points inside the decision boundaries.Then, like linear regression, it finds the best fit hyperplane that best fits the data points inside the decision boundaries (as shown in Fig. 3c).

Multi-layer perceptron (MLP)
Multi-layer Percenptrons [57] are deep feedforward Artificial Neural Networks (ANNs) where the information flows in one direction from input layer through intermediate hidden computational layers to output layer (as shown in Fig. 4a).All the nodes in one layer are connected with a certain weight to all the nodes in the subsequent layers.All the nodes except the input nodes usually use nonlinear activation functions to control the flow of information to the subsequent layers.These layers are also called Fully Connected (FC) dense layers.MLPs use backpropagation technique where error in the output is used to improve the network weights.MLPs use training data to train a model to later use it for prediction for new data.

Convolutional neural network (CNN)
Convolutional Neural Networks (CNNs) [58] are ANNs that include convolutional and pooling layers in addition to FC dense layers (as shown in Fig. 4b).CNNs are popularly used for image and speech signal classification and recognition.For a two dimensional (2-D) image, a convolution layer uses a 2-D, for example 3 × 3 kernel filter, which is a 2-D array of weights, to sweep over the entire image starting from top-left corner.Then it finds the dot product of that portion of image and the filter, and slides (shift) right by a few pixels, and repeats this process.The final output is a feature map consisting of the dot product values.The pooling layers, like convolution layers, also sweeps across the image with an aggregation function to reduce dimensions and overfitting, improve computational efficiency, and catch global context.

Experiments
We need to evaluate the candidate models correctly to have a good BGL prediction model.We have applied different ML models as mentioned in Subsection 3.4.Each of the sample data has 23.features as mentioned in Subsection 3.2.We have conducted a number of experiments to evaluate the models as follows.
Firstly, we have run experiments to find the relationship between the input features and the predicted BGL.We have computed the individual amount of influence of each input feature using information theory concept first.Then, we have used incremental combinations of input features as shown in Table 4 to find the influence of additional information and the best possible combination of features.Here, we have identified our best prediction models.Following this, Principal Component Analysis (PCA) was applied to evaluate its contributions to model performance.The goodness of fit for all models was assessed using R-squared values.Then, we have used one different user's data for testing and other users' data for training.This data separation using different patients is chosen to observe whether the models can generalize.The experimental results are listed in Table 8.The models are then applied to individual patient's data to generate the personalized models.Various split percentages for training and testing data are explored to see whether the optimal split match the already existing consensus.They are listed in Table 10.Additionally, Pearson Coefficients are computed to find the correlation between individual input features and BGL as given in Table 4.This will help to find the most contributing features.Finally, computational and model complexity is computed and compared with previous studies.
For the assessment of all our models, we have split the full dataset into 90% for training data and 10% for testing data.To effectively capitalize our small data set, we have used k-fold cross-validation on the training dataset instead of another one time split into training and validation set of the training dataset.We have tried from 3 folds to 8 folds.Three (3) cross validation folds is found to perform better than others.
The dataset is a 902 × 23 matrix, where 902 is the number of samples, and 23 is the number of input features.For the MLP and CNN models, the first hidden layer takes the 902 × 23 matrix as input.

Evaluation metrics
BGL prediction performance of different models are evaluated in terms of Root Mean Square.Error (RMSE) as defined in Equation ( 2), Clarke's Error Grid Analysis (CEG), R-squared (R2), and.Adjusted R-squared (R 2 adj ) Root Mean Square Error (RMSE) is a popular statistical metric in regression analysis.It provides the prediction error of the models.It is widely used and will help us to compare our results with the similar previous studies.Every experiment is run for five times and the average RMSE value is computed and reported here.
where y i are the actual BGL values, ŷi are the predicted BGL values, y is the mean of the actual BGL values, n is the number of data samples, and k is the number of independent variables in the model.Clarke's Error Grid Analysis (CEG) [22] is a clinical regression metric that provides the clinical suitability of a model.CEG has five regions: A, B, C, D, and E. Region A includes correct predictions, while Region B comprises predictions that, while not entirely accurate, do not result in incorrect treatments.Region C consists of predictions that may lead to unnecessary treatments.Region D represents potentially high-risk failures to detect hypo-or hyperglycemic events.Region E includes predictions that may cause confusion in the treatment of hypo-or hyperglycemia.
R-squared (R2) (given in Equation ( 3)) and Adjusted R-Squared (R 2 adj ) (given in Equation ( 4)) are statistical regression metrics.They are more popularly used to measure the goodness of fit of a model.R2 is also called the coefficient of determination and normally ranges from 0 to 1. R 2 adj is used to include the number of parameters, or coefficients, or vectors, or trees to curb the influence of overfitting.It ranges from negative infinity to 1.For both of them, the closer their values to 1 the more the independent variables can explain the variance in the dependent variables.

Hyper-parameter selection
For the hyper-parameter selection of all the models in this study, we have used GridSearch [59].The optimal number of principal components for PCA is determined to be 5.For SVR, the relevant hyperparameters have included the kernel, regularization parameter (C), kernel coefficient (gamma), and epsilon.Due to the extensive hyperparameter space, we have optimized them separately, resulting in optimal values: kernel (rbf), C (2), gamma (scale), and epsilon (0.05).
Random forest, on the other hand, have hyperparameters such as the number of trees, maximum depth of trees, minimum samples split, minimum samples leaf, and maximum features.The optimized values are found to be: n estimators (250), max depth (5), min samples split (35), min samples leaf (2), and max features ('auto').For XGBoost, hyperparameters have encompassed learning rate, num-ber of trees, maximum depth of a tree, minimum sum of instance weight needed in a child, subsample, and colsample bytree.The optimal hyperparameters are determined as follows: learning rate (0.01), n estimators (175), max depth (4), min child weight (1), subsample (0.7), and colsample bytree (0.6).

Assessing the impact of input features on BGL prediction
Different input features have different levels of influences on BGL.Here, we present the analysis on the influence of individual input features, their combinations, and meal macronutrients.To find the most useful features we have computed the Mutual Information (MI) [60], which measures the relationship in terms of uncertainty between a feature and the target variable, for all the input features (shown in Fig. 5).Due to the limited number of subjects in our dataset, the profile features: age, weight, height, diabetes duration, family history and medication features have displayed strong relationship as expected.It is observed that calorie and exercise duration have moderate relationship with BGL.

Comparative analysis of feature combinations in BGL prediction models
We have used different combinations of input features to find the best possible combination of features.The results are shown in Table 4.The performances of the five models are given on separate columns.To observe the influence of additional information, we have incrementally added features starting from glucose (Row 1), then meal & exercise separately (Rows 2 & 3), then three features with both meal & exercise (Row 4), then medicine (Row 5), and finally profile (Row 6).We observe that with each additional feature, the performance has been improved.We observe comparing Rows 1 and.
2, and Rows 3 and 4, that meal information improves performance.Comparing Rows 4 and 5, we observe that medicine improves the performance significantly as expected.
It is also observed that the traditional ML technique random forest provides the best performance with an average RMSE of 40.15 mg/dL over all feature sets.This is because random forest uses decision trees with minimal overfitting.Decision trees usually map input features to output better than other computational methods specially from simple datasets like ours.Therefore, for our carefully designed dataset random forest performs better to map the input features to BGL than other models.The CNN model has come third with an average RMSE of 41.72 mg/dL over all feature sets.It is a good performance considering the limited number of samples.The optimal performance is observed when using glucose, meal, exercise, medicine, and profile as the feature set, employing the random forest model, resulting in an RMSE value of 31.87 mg/dL.Notably, this achievement closely aligns with the top-performing research outlined in Refs.[11,16].It's noteworthy that a significant portion of existing research concentrates on predicting short-term horizons, typically around 30 min.This is due to the inherent challenges associated with forecasting blood glucose levels (BGL) over longer intervals, which involve a more extensive array of influencing factors.This is further exemplified in the work by Zarkogianni et al. [11], a prediction model based on Self-Organizing Maps (SOM) and trained on 6 days of CGM readings, changes in CGM readings, and physical activity data from 10.
T1DM patients, that yields an RMSE of 31 mg/dL for a 2-hr PH.Similarly, Karim et al. [16] have proposed a hybrid model trained on 3 weeks of meal, insulin, and CGM readings data from 5 diabetic patients, achieving an RMSE of 31.5 mg/dL for a 2-hr PH.These studies have used continuous CGM data.These comparative benchmarks underscore the competitiveness of our proposed model using discrete data points in the realm of 2-hr BGL prediction.
Moreover, we have computed the CEG prediction percentages as given in Table 5. is observed that both RFR with a prediction percentage of 98.79% and XGB with a prediction percentage of.99.11% have performed better than or close to the majority of the existing research works [16,19,20,32] in the A + B prediction regions for a 2-hr PH.For instance, Velasco et al. [32] have utilized a dynamic time warping algorithm to segregate the data, have employed Markov Chain Monte Carlo sampling for data augmentation, have trained a Grammatical Evolution (GE) model, and have finally achieved a 99.56% prediction percentage in the A + B regions.Liu et al. [20] have applied a physiological model to two weeks of data from 10 T1DM patients, attaining a 95.27% prediction percentage in the A + B regions.Kushner et al. [19] have used a feed-forward neural network on a dataset of 24 T1DM patients, achieving a 94% correct BGL prediction percentage in the A + B regions.Karim et al. [16] have reported a 98.13% correct BGL prediction percentage in the A + B regions.Among the RFR and XGB models, it is notable that the XGB model has demonstrated superior capability in avoiding potentially hazardous failure to detect hypoor hyperglycemia events, albeit with a slightly lower percentage in the D region.

Assessing the impact of food macronutrients on BGL prediction
Meal is considered as one of the key contributors to blood glucose.It is expected that meal intake will increase BGL.Meal information should help predicting the BGL better.As discussed in Subsection 5.2, the impact of meal intake in BGL prediction is moderate in our experiments.We have computed the MI scores of meal macronutrients as displayed in Fig. 6.To investigate further, we have run experiments targeting the meal macronutrient values: CHO, protein, fat, fiber, GL, and calorie.In the feature set for each experiment, in addition to glucose, we have used the meal macronutrients.The performance result is given in Table 6.In one experiment (last row of the table), all the macronutrients are present, and in all other experiments one of the meal macronutrients is absent.For example, in the first row, only the CHO is absent in the feature set, as mentioned by.
"except CHO" in the parentheses.We observe that the best performance is achieved when all the macronutrients are present.Fiber is observed as the only macronutrient with negligible contribution to the performance.For all other macronutrients, the performances are accordingly.That means, the rest of the macronutrients contribute significantly to the prediction.

Statistical significance assessment of BGL prediction models
Regardless of the chosen model, on average the best contributing feature group is Glucose, Meal, Exercise, Medicine, and Profile.For this group of features, we have conducted significance tests to see if one model performs significantly better than the others.In our study, we have found that.the paired t-test is appropriate for comparing two models, such as random forest and CNN.This is because the predictions made by these models can be thought of as paired observations.To ensure the reliability of our comparison, we have first checked if the differences between the paired observations followed a normal distribution.We have used the Shapiro-Wilk test for this purpose.Additionally, we have visually inspected the distribution by creating a histogram.Once we have confirmed that the differences were approximately normally distributed, we have proceeded to perform paired t-tests between the pairs of models.After conducting these tests, we have observed that there is no statistically significant difference in the prediction performance of the random forest, CNN, and XGBoost models.The other two SVR and MLP do have statistically significant difference in the prediction performance.

Fitness assessment of BGL prediction models
The R-squared performance metric serves as a crucial evaluation tool alongside RMSE and CEG, offering insights into the goodness of fit between a model and the sample data.It measures the proportion of the variance in the dependent variable that is predictable from the independent vari-ables.Additionally, adjusted R-squared is computed to enhance the model's generalization capability, mitigating potential overfitting issues.Upon scrutinizing the adjusted R-squared results, presented in Table 7, it becomes evident that the CNN model exhibits the highest level of compatibility with the dataset.While the CNN performance may marginally lag behind the other four methods, the adjusted Rsquared outcomes suggest that it is exceptionally well-suited to capture the underlying patterns within the data.Consequently, despite its slightly inferior performance in comparison to other methods, the CNN model emerges as a favorable choice, emphasizing its robust fit and potential for effective predictions.
The scarcity of studies reporting adjusted R-squared values for blood glucose prediction within a 2-h horizon adds significance to our findings.In one of these limited studies, Bock et al. [23] noted an R-squared value of 0.72 for their model.In comparison, our CNN model has demonstrated a superior R-squared of 1, as detailed in Table 7.While acknowledging the slightly less favorable performance than other models, this study highlights the CNN model's prowess in achieving superior fit and good predictive accuracy.

Impact of PCA-based feature selection on model performance and efficiency
We have used PCA to select a few features from a lot of features to simplify the model.The outcomes, presented in Table 7, indicate a notable reduction in training time for XGB and SVR, a moderate decrease for CNN and RF, and minimal impact for MLP.However, Fig. 6.Relationship between the meal macronutrients and BGL.

Table 6
Influence of meal macronutrients reflected by the RMSE values.All feature sets include the BGL measured before meal intake.They (except the last row) also include all the meal macronutrients except one as mentioned in the parentheses.prediction times have slightly increased.Evaluation metrics such as RMSE, CEG, R-squared, and adjusted R-squared have experienced a slight degradation, aligning with our expectations.Consequently, the employment of PCA did not yield significant computational efficiency gains for all the models, especially for our best performing random forest model.

Generalizability of the models across different data-distributions
In this subsection, we have run experiments to present the generalizability of the proposed data-driven models.In these experiments, training and testing data comes from different distributions, i.e., one user's data is used for testing and other users' data are used for training.The results are listed in Table 8.In our collected dataset, we have five subjects' data.In the model training, data of four subjects are used as training data with Cross Validation (CV).The results (RMSE values) are given in the -Train columns.For model testing, the remaining subject's data is used as the test data.The results (RMSE values) are given in the -Test columns.The RMSE values of CNN, and MLP models prediction result on test data are below satisfactory.This is due to the small size of the dataset, and non-linear nature of the data.SVR, random forest, and XGB models have reasonably good RMSE values.Hence, these three models can generalize well.RF model has the best performance as given in Table 8.We observe that for RF the test results are closest to the cross validation results.

Prediction performance analysis of the personalized models
Personalized models have been generated by applying the modeling techniques on individual patient data.The results are given in Table 9.The models trained with data of patient ids 10 and 12 observe best results with low average RMSE values of 19.53 mg/dL and 25.85 mg/dL.This is due to the fact that the patient with Id 10 has been recently diagnosed with diabetes.The patient with Id 12 is prediabetic.The differences between their before meal and after meal BGLs have low variance.On the other hand, other patients have a long history of diabetes.The differences between their before meal and after meal BGLs have high variance.

Identifying the optimal training-testing data split
We have run experiments with various split percentages for training and testing data.They are listed in Table 10.With the increase of training data, performances of the models improve, and the RMSE decreases.This happens up to the split percentage of training: 70.0%, testing: 30.0%.After that, with an increase in training data, the performance gets worse.This is due to the overfitting of the data by the models.Therefore, training: 70.0%, testing: 30.0%data split brings the best performance.This result from our discrete value dataset matches the already existing consensus in the research community.

Challenges in finding correlations between single features and BGL
Pearson coefficients are computed from raw data between individual input feature and 2 h after meal BGL as given in Table 11.The macronutrients and other characteristics (CHO, protein, fat, fiber, calorie, and GL) of food contribute differently to the BGL.BGL increases with an increase of CHO.Body takes more time to break down protein.And protein doesn't contribute directly to BGL.Hence, with more protein BGL may be less within a certain time.Similarly, due to slow digestion, BGL may be less with an increase of fiber in food within a certain time.Though fat does not have any direct influence on BGL, high fat causes less digestion.In turn this makes insulin become less effective.This might cause BGL to increase with fat increase.In addition to food macronutrients, physical exercise, medication, and many other factors influence BGL.Therefore, it would be difficult to see the one to one relationship as mentioned above.That is what is observed from the pearson coefficients of different subjects (with IDs 7,9,10,11,and 12)   example, for the patient with ID 10 (see the column with ID 10), for exercise duration, we observe a positive correlation of.0.234 though theoretically it has a negative correlation with BGL; for CHO, we observe a negative correlation of − 0.192 though theoretically it has a positive correlation with BGL; etc.

Simplicity and efficiency analysis of the BGL prediction models
The computational and model complexities of the models employed in this study are detailed in Table 12, with their corresponding hyperparameters provided in subsection 4.2.All of our models demonstrate simplicity, as evident from both the training and prediction times presented in the ta-ble.These computations are executed on a laptop PC equipped with an Intel Core i5-9300H CPU.@ 2.40 GHz and 24.0 GB of installed RAM.Notably, CNN stands out as one of the slower models, requiring just over a minute for training and 387 ms for inference.Despite this, their per-formance aligns with existing works [6,61] focusing on online model retraining, and with previous studies [6,7,38,[61][62][63] focusing on online prediction.This ensures their suitability for deployment on smart devices for predictive purposes.

Discussion
In this section, we have presented the main outcomes of this study, outlined the study's limitations, and potential directions for future research.These aspects are described in detail below.

Key findings
In this study, we observed that different input features exert varying influences on Blood Glucose Level (BGL).The analysis explores individual features, their combinations, and meal macronutrients.Mutual Information (MI) is computed to measure the relationship between each feature and the tar-get variable.Profile features show a strong relationship, while calorie and exercise duration exhibit a    [11,16] The study acknowledges the challenges associated with predicting blood glucose levels over longer inter-vals, contrasting with existing research that often focuses on shorter-term horizons.Paired t-tests are conducted to compare model performances, particularly between random forest, CNN, and XGBoost.The analysis confirms no statistically significant difference among these models.However, SVR and MLP exhibit significant differences in prediction performance.Additionally, Cumulative Error Grid (CEG) prediction percentages highlight the competitive performance of random forest and XGBoost models.
Principal Component Analysis (PCA) is employed to simplify models, with outcomes indicating reduced training time for some models, despite a slight increase in prediction times and minor degra-dation in evaluation metrics (RMSE, CEG, R-squared, adjusted Rsquared).Model fitness is assessed using R-squared values.The CNN model exhibits the highest compatibility, emphasizing its robust fit and potential for effective predictions within the given context.
Experiments on model generalizability, where training and testing data come from different dis-tributions, reveal reasonable performance for SVR, random forest, and XGB models.Random forest demonstrates the best performance in this context.Personalized models, generated from individual patient data, show varying performance.Patients with recent diabetes diagnoses or prediabetic conditions exhibit lower variance in before and after meal BGLs, resulting in better model performance.The computational and model complexities of the employed models are detailed.Despite the slower training time of the CNN and MLP models, all models demonstrate simplicity and suitability for deployment on smart devices for predictive purposes.

Limitations and future works
From the results, we can see that our best generalized models provide a state of the art or near state of the art performances.Yet, they have the following limitations.

Small Sample Size:
The limited dataset, comprising information from only 5 patients, limits the generalizability of the developed models.To enhance robustness and applicability across a broader population, acquiring more data from diverse patients is crucial.2. Limited Clinical Interpretability: The use of black-box machine learning models introduces a limitation in terms of clinical interpretability.Their adoption in clinical settings where inter-pretability is crucial can be challenging.Future efforts should aim to improve the transparency of the models.3. Prospective Real-World Testing: While the study provides insights into model performance based on historical data, a limitation lies in the absence of prospective testing in real-world scenarios.Evaluating the models in real-time and real-world conditions is essential to validate their practi-cal utility and reliability in diverse clinical settings.Future research should prioritize prospective testing to ensure the models' effectiveness in real-world applications.
Other future research directions can include the integration of additional regression predictive models, expanding the set of input features, implementing data augmentation techniques to address data scarcity issues, offering confidence levels alongside blood glucose level values, and further refining feature engineering approaches.

Conclusions
Two hours after meal BGL prediction is crucial for T2DM patients as it guides a patient to follow a healthy diet or to avoid any adverse glucose situation.We have made several contributions in this regard.Firstly, we have introduced a 2 h after meal BGL prediction framework from discrete BGL readings of T2DM patients and other parameters such as food intake, medicine, exercise, etc using data-driven models.To the best of our knowledge, this is the first work using data-driven models to demonstrate that discrete BGL values are able to provide a state of the art performance in predicting BGL values.Secondly, we have meticulously designed and collected a unique dataset encompassing discrete BGL values (two per day, one before meal, and one 2 h after meal), medication doses, meal images, physical activities, and profile from five T2DM patients over a six-month period.Thirdly, we have conducted a comprehensive evaluation of several traditional ML regression and Deep Learning techniques using the popular metrics RMSE, CEG, Rsquared, and adjusted R-squared values.The random forest model has exhibited the best overall performance, closely trailed by the CNN model.Additionally, we have analyzed the scale of contributions of the input features, the generalization capability of the different data-driven models, and the performances of the personalized models.While the generalized models exhibited reasonable performance across diverse user distributions, personalized models showcased varied results, particularly excelling in cases of recently diagnosed and prediabetic patients.Though some results are not conclusive, we believe that the use of discrete BGL values is promising in predicting 2 h after meal BGLs.We have acknowledged a few limitations and suggested potential avenues for future research of our proposed framework.We emphasize that more research works need to focus on BGL prediction using discrete data from a large cohort of T2DM patients.In conclusion, our work not only addresses a critical aspect of T2DM management but also con-tributes novel insights into the potential of discrete BGL values and data-driven models.The path forward involves addressing identified limitations and expanding the scope of research to encompass a more extensive cohort, advancing our understanding of predictive modeling in the context of post-meal BGLs for improved diabetes management.

Fig. 3 .
Fig. 3.The working principles of the traditional machine learning models used in this study.(a) The random forest method, (b) The XGB method, (c) The SVR method.

Fig. 4 .
Fig. 4. The working principles of the Deep Learning models used in this study.(a) The MLP model, (b) The CNN model.

Table 2
User profiles of the dataset.

Table 3
One sample data of input features.

Table 4
Prediction performances of experiments using incremental input feature combinations in terms of average RMSE values.

Table 5
CEG prediction percentages for the top performing feature group for 2-hr PH.

Table 7
Performances of different models with and without using PCA.All the available 23 features are used.Notations -Adj R2: Adjusted R2, Timet: training time in seconds, Timep: prediction time in seconds.These experiments are run on Google Colab using GPU.

Table 8
in Table11.For Experiments on measuring the generalizability of the models.Model performances are measured in RMSE values.

Table 9
Personalized model performances (evaluated in RMSE values).

Table 10
Model performances of the train-Test split experiments measured in RMSE values.

Table 11
Pearson coefficients between BGL contributors and 2 h after meal BGL.

Table 12
Computational and model complexity of different models used in this study.Timetrain and Timeprediction are training and prediction time in seconds.Timetrain and Timeprediction complexities given below are simplified approximations based on the common operations of the methods.Notations -n: number of samples in the training data, f: number of features/channels, t: number of trees, m: avg.# of neurons in each layer, l: # of layers, h: height of input channel, w: width of input channel, c: # of filters, k: size of Conv kernel, P: parameters, SV: Support Vectors, BR: Boosting Rounds.connection with BGL.Meal macronutrients, especially when combined with glucose, sig-nificantly contribute to BGL prediction.Experiments targeting macronutrient values reveal that the presence of all macronutrients leads to the best performance, with fiber being the only macronutrient showing negligible contribution.Various input feature combinations are assessed for optimal performance.The results, displayed in Table4, reveal the effectiveness of different models.Random forest demonstrates the best performance with an average Root Mean Square Error (RMSE) of 40.15 mg/ dL.Incrementally adding features im-proves performance, and the best results are achieved with the combination of glucose, meal, exercise, medicine, and profile, employing the random forest model with an RMSE value of 31.87 mg/dL.Signif-icantly, this accomplishment closely aligns with the top-performing research highlighted in Refs. moderate